Statistical Test-Based Evolutionary Segmentation of Yeast Genome
Abstract
Segmentation algorithms arise from the observation that DNA sequences fluctuate between alternating homogeneous domains, which are called segments [1]. The key idea is that two genes controlled by a single regulatory system should have similar expression patterns in any data set. In this work, we present a new approach based on Evolutionary Algorithms (EAs) that differentiates segments of genes, where each gene is represented by its level of meiotic recombination. We have tested the algorithm on the yeast genome [2][3] because this organism is of great interest to the research community: it preserves many biological properties of more complex organisms, yet it is simple enough to run experiments on.

We have a file with about 6100 genes, divided into sixteen yeast chromosomes (N). Each gene is a row of the file. Each column of the file represents a genomic characteristic under specific conditions (in this case, only the activity of meiotic recombination). The goal is to group consecutive genes into segments that are clearly differentiated from their adjacent segments. Each group is a segment of genes because it maintains the genes' physical location within the genome. To measure the relevance of segments, the Mann–Whitney statistical test has been used.

Each individual of the population is an array of natural numbers of size C, and it represents a collection of cutpoints within the yeast genome. Fifteen of these cutpoints correspond to the boundaries between the sixteen chromosomes of the yeast genome, and they are permanent. The sixteen cutpoints corresponding to the centromeres are also permanent, so there are 31 constant cutpoints. The centromere lies approximately in the middle of a chromosome and divides it into two arms (L and R). Although these fixed cutpoints (FC=31) cannot be moved, they are included in every individual, which simplifies the computation.
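The representation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `make_individual` and `mutate`, the uniform random placement of the non-fixed cutpoints, and the mutation operator itself are assumptions made here for illustration, since the abstract only states that an individual is an array of C cutpoints and that the FC fixed ones never move.

```python
import random

def make_individual(n_genes, fixed_cutpoints, size_c, rng):
    """Build one individual: a sorted list of size_c distinct cutpoints.

    A cutpoint k marks the boundary between gene k and gene k + 1 (1-based),
    so valid values are 1 .. n_genes - 1. Every fixed cutpoint is included;
    the remaining size_c - FC positions are drawn at random (an assumption).
    """
    fixed = set(fixed_cutpoints)
    n_free = size_c - len(fixed)
    if n_free < 0:
        raise ValueError("size_c must be at least the number of fixed cutpoints")
    candidates = [k for k in range(1, n_genes) if k not in fixed]
    return sorted(fixed | set(rng.sample(candidates, n_free)))

def mutate(individual, fixed_cutpoints, n_genes, rng):
    """Hypothetical mutation: move one non-fixed cutpoint to an unused position.

    Fixed cutpoints are never selected, so they survive in every offspring.
    """
    fixed = set(fixed_cutpoints)
    used = set(individual)
    movable = [k for k in individual if k not in fixed]
    unused = [k for k in range(1, n_genes) if k not in used]
    if not movable or not unused:
        return list(individual)
    old = rng.choice(movable)
    new = rng.choice(unused)
    return sorted((used - {old}) | {new})

# Toy usage: 50 genes and 3 fixed cutpoints standing in for the 31 real ones.
rng = random.Random(0)
fixed = [10, 20, 30]
parent = make_individual(50, fixed, 8, rng)
child = mutate(parent, fixed, 50, rng)
```

Keeping the fixed cutpoints inside every individual means that any routine consuming the array can treat all boundaries uniformly, while the variation operators simply skip the fixed positions.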
For example, if a cutpoint array includes the values 34, 57, 7, 25 and 80, there is a cutpoint between genes 34 and 35, between genes 57 and 58, between genes 7 and 8, and so on.

We have chosen the Mann–Whitney test as the fitness function. The Mann–Whitney test, also known as the Wilcoxon rank sum test, is a non-parametric test for a difference between the medians of two independent groups. It is the non-parametric equivalent of the two-sample t-test. The test makes no assumption about the form of the underlying distributions; in particular, it does not assume that the populations are Gaussian. This method was chosen because adjacent segments need to be clearly differentiated. If we choose the mean as ...
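The comparison of adjacent segments can be sketched as below, using the example cutpoints 34, 57, 7, 25 and 80. This is an illustrative sketch under stated assumptions: the helper names (`cutpoints_to_segments`, `mann_whitney_p`, `adjacent_p_values`) are hypothetical, the p-value uses the plain normal approximation without tie or continuity correction, and segments are compared across every cutpoint, including chromosome boundaries. The abstract does not say how the individual test results are combined into a single fitness value, so the sketch stops at the list of p-values.

```python
import math

def cutpoints_to_segments(cutpoints, n_genes):
    """Turn cutpoints into inclusive (first_gene, last_gene) pairs, 1-based.

    A cutpoint k separates gene k from gene k + 1.
    """
    bounds = sorted(set(cutpoints))
    starts = [1] + [k + 1 for k in bounds]
    ends = bounds + [n_genes]
    return list(zip(starts, ends))

def average_ranks(values):
    """Rank values from 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def mann_whitney_p(x, y):
    """Two-sided p-value of the Mann-Whitney U test (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic of the first group
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))

def adjacent_p_values(values, cutpoints):
    """p-value for each pair of neighbouring segments; values[i] is gene i + 1."""
    segments = cutpoints_to_segments(cutpoints, len(values))
    groups = [values[s - 1:e] for s, e in segments]
    return [mann_whitney_p(a, b) for a, b in zip(groups, groups[1:])]

# The example array from the text, on a toy genome of 100 genes.
segments = cutpoints_to_segments([34, 57, 7, 25, 80], 100)
# segments == [(1, 7), (8, 25), (26, 34), (35, 57), (58, 80), (81, 100)]
```

A small p-value for a pair of neighbouring segments indicates that their recombination levels differ in location, which is the kind of separation the cutpoints are meant to capture. For real analyses, a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred over this hand-written approximation, particularly for small or heavily tied segments.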